12 research outputs found

    NSDL EduPak: An Open Source Education Repository Solution

    4th International Conference on Open Repositories. This presentation was part of the session: Conference Posters. Educational organizations and institutions focused on establishing specialized digital collections, conducting educational research, or providing students, teachers and instructors with discipline-oriented pedagogical products and tools require basic technology to begin building educational digital repositories. To help meet these needs, the National Science Digital Library (NSDL) has announced the release of NSDL EduPak. Specifically designed for education, NSDL EduPak packages technology for digital storage, access, and workflow into a convenient bundle. This poster reviews three core EduPak components with examples of how they are used by education communities. National Science Foundatio

    BIOZON: a hub of heterogeneous biological data

    Biological entities are strongly related and mutually dependent on each other. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein–protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated onto a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object than can be determined from any one source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32 000 protein structures, 150 000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries and rank the results based on analysis of the edge structure similar to Google PageRank, online at
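    The abstract above mentions ranking results by analysing the edge structure of the graph, in a way similar to PageRank. As a rough illustration only, the following is a minimal power-iteration PageRank over a tiny, made-up graph of typed documents. The node names, edge list, damping factor and iteration count are all assumptions chosen for the example; this is not Biozon's actual schema or ranking method.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical documents of several types linked in one graph.
EDGES = [
    ("dna:D1", "protein:P1"),
    ("protein:P1", "structure:S1"),
    ("protein:P1", "pathway:W1"),
    ("protein:P2", "pathway:W1"),
    ("interaction:I1", "protein:P1"),
    ("interaction:I1", "protein:P2"),
]

ranks = pagerank(EDGES)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node:16s} {score:.4f}")
```

    In this toy graph the pathway node collects rank from both proteins, so it ends up ranked highest; a query result list could then be ordered by such scores.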

    BIOZON: a system for unification, management and analysis of heterogeneous biological data

    BACKGROUND: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. DESCRIPTION: Here we present a system (Biozon) that addresses these problems, and offers biologists a new knowledge resource to navigate through and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts as it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented and first-of-a-kind biological ranking systems were explored and integrated. CONCLUSION: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at

    Re-thinking Fedora's storage layer: A new high-level interface to remove old assumptions and allow novel use cases

    Traditionally, the pluggable storage interface in Fedora has followed a "low-level" paradigm where objects and datastreams are presented to the storage layer as independent, anonymous blobs of data. This arrangement has proven simple, reliable, and generally flexible. In the past few years, however, there has been an increasing need for Fedora to mediate storage in more complex scenarios. Managing large numbers of big datastreams, multiplexing storage between different devices or cloud storage, and archiving content in a transparent manner are tasks that are difficult to achieve through Fedora currently
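    The contrast drawn in this abstract, between a store that only sees anonymous blobs and one that sees whole objects, can be sketched roughly as follows. Every class and method name here (BlobStore, DigitalObject, ObjectAwareStore, add_object, get_datastream) is hypothetical and invented for illustration; none of them is Fedora's actual API, and size-based routing is just one assumed example of the "multiplexing" use case.

```python
from dataclasses import dataclass, field
from typing import Dict


class BlobStore:
    """Low-level store: every item is an anonymous blob addressed by a key."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]


@dataclass
class DigitalObject:
    """Hypothetical repository object: an identifier plus named datastreams."""

    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)


class ObjectAwareStore:
    """High-level store: it receives the whole object, so it can decide
    per datastream where the content should go (here, by size)."""

    def __init__(self, small: BlobStore, large: BlobStore, threshold: int) -> None:
        self.small = small
        self.large = large
        self.threshold = threshold

    def add_object(self, obj: DigitalObject) -> None:
        for ds_id, content in obj.datastreams.items():
            # Route big datastreams to the bulk device, the rest locally.
            target = self.large if len(content) >= self.threshold else self.small
            target.put(f"{obj.pid}/{ds_id}", content)

    def get_datastream(self, pid: str, ds_id: str) -> bytes:
        key = f"{pid}/{ds_id}"
        for store in (self.small, self.large):
            if key in store.blobs:
                return store.get(key)
        raise KeyError(key)


local_disk = BlobStore()
bulk_storage = BlobStore()
store = ObjectAwareStore(local_disk, bulk_storage, threshold=1024)
store.add_object(DigitalObject("demo:1", {"DC": b"<dc/>", "VIDEO": b"\x00" * 4096}))
print(sorted(local_disk.blobs), sorted(bulk_storage.blobs))
```

    Because the high-level store knows which datastream belongs to which object, the routing decision stays inside the storage layer, which a purely blob-oriented interface cannot express.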

    Correcting BLAST e-values for low-complexity segments

    The statistical estimates of BLAST and PSI-BLAST are extremely important for determining the biological relevance of sequence matches. While being very effective in evaluating most matches, these estimates usually overestimate the significance of matches in the presence of low-complexity segments. In this paper we present a model, based on divergence measures and statistics of the alignment structure, that corrects BLAST e-values for low-complexity sequences without filtering or excluding them. We evaluate our method and compare it to other known methods using the Gene Ontology (GO) knowledge resource as a benchmark. Various performance measures, including ROC analysis, indicate that the new model improves over the state of the art. The program is available at biozon.org/ftp/ and www.cs.technion.ac.il/~itaish/lowcomp

    Using RMap to Describe Distributed Works as Linked Data Graphs: Paper - iPRES 2016 - Swiss National Library, Bern

    Today's scholarly works can be dynamic, distributed, and complex. They can consist of multiple related components (article, dataset, software, multimedia, webpage, etc.) that are made available asynchronously, assigned a range of identifiers, and stored in different repositories with uneven preservation policies. A lot of progress has been made to simplify the process of sharing the components of these new forms of scholarly output and to improve the methods of preserving diverse formats. As the complexity of a scholarly work grows, however, it becomes unlikely that all of the components will become available at the same time, be accessible through a single repository, or even stay in the same state as they were at the time of publication. In turn, it also becomes more challenging to maintain a comprehensive and current perspective on what the complete work consists of and where all of the components can be found. It is this challenge that makes it valuable to also capture and preserve the map of relationships amongst these distributed resources. The goal of the RMap project was to build a prototype service that can capture and preserve the maps of relationships found amongst these distributed works. The outcomes of the RMap project and its possible applications for preservation are described